Disclaimer


THIS HAS NOTHING TO 
DO WITH WAVELETS! 


Indexed String Sequences 


(foo, bar, foobar, foo, bar, bar, foo)
0 1 2 3 4 5 6
• Queries

- Access(i): access the i-th element
  • Access(2) = foobar

- Rank(s, pos): count occurrences of s before pos
  • Rank(bar, 5) = 2

- Select(s, i): find the i-th occurrence of s
  • Select(foo, 2) = 6
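As a reference point, the three queries can be sketched naively in linear time. This is only an illustration of the semantics (0-based positions and occurrence indices, as the examples above imply), not the succinct structure the talk is about; the function names are chosen here for illustration.

```python
# Naive O(n) reference implementations of Access, Rank and Select.
# Positions and occurrence indices are 0-based, matching the slide examples.
S = ["foo", "bar", "foobar", "foo", "bar", "bar", "foo"]

def access(seq, i):
    """Return the i-th element of the sequence."""
    return seq[i]

def rank(seq, s, pos):
    """Count occurrences of s strictly before position pos."""
    return sum(1 for x in seq[:pos] if x == s)

def select(seq, s, i):
    """Return the position of the i-th occurrence of s, or None if there is none."""
    seen = 0
    for pos, x in enumerate(seq):
        if x == s:
            if seen == i:
                return pos
            seen += 1
    return None

print(access(S, 2), rank(S, "bar", 5), select(S, "foo", 2))  # foobar 2 6
```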


Prefix operations 


(foo, bar, foobar, foo, bar, bar, foo)
0 1 2 3 4 5 6


• Queries
- RankPrefix(p, pos): count strings prefixed by p before pos
  • RankPrefix(foo, 5) = 3
- SelectPrefix(p, i): find the i-th string prefixed by p
  • SelectPrefix(foo, 2) = 3
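The prefix variants differ from Rank and Select only in the matching test. A naive linear-time sketch (illustrative names, 0-based indices) makes this explicit:

```python
# Naive O(n) reference implementations of the prefix queries.
S = ["foo", "bar", "foobar", "foo", "bar", "bar", "foo"]

def rank_prefix(seq, p, pos):
    """Count strings that start with p strictly before position pos."""
    return sum(1 for x in seq[:pos] if x.startswith(p))

def select_prefix(seq, p, i):
    """Return the position of the i-th string that starts with p, or None."""
    matches = [pos for pos, x in enumerate(seq) if x.startswith(p)]
    return matches[i] if i < len(matches) else None

print(rank_prefix(S, "foo", 5), select_prefix(S, "foo", 2))  # 3 3
```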


Example: storing relations


• Write the columns as string sequences
- Store them separately

- Reduce relational operations to sequence queries
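One possible reading of this reduction, sketched with naive sequence queries. The two-column relation, its values, and the helper names `count_where` and `project_where` are hypothetical and only serve to illustrate the idea:

```python
# Hypothetical relation (user, url), stored as two separate string sequences.
users = ["alice", "bob", "alice", "carol", "alice"]
urls = ["/a", "/b", "/c", "/a", "/b"]

def rank(seq, s, pos):
    """Count occurrences of s strictly before position pos."""
    return sum(1 for x in seq[:pos] if x == s)

def select(seq, s, i):
    """Position of the i-th (0-based) occurrence of s; assumes it exists."""
    return [pos for pos, x in enumerate(seq) if x == s][i]

def count_where(col, value, lo, hi):
    """COUNT(*) WHERE col = value, restricted to rows lo <= row < hi."""
    return rank(col, value, hi) - rank(col, value, lo)

def project_where(col, value, other):
    """SELECT other WHERE col = value, using Rank, Select and Access."""
    total = rank(col, value, len(col))
    return [other[select(col, value, i)] for i in range(total)]

print(count_where(users, "alice", 1, 5))     # 2
print(project_where(users, "alice", urls))   # ['/a', '/c', '/b']
```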
Other operations:


Dynamic sequences 


We want to support the following operations: 
• Insert(s, pos): insert the string s immediately before position pos

• Append(s): append the string s at the end of the sequence (special case of Insert)

• Delete(pos): delete the string at position pos

If the data structure only supports Append, we call it append-only, otherwise dynamic (or fully dynamic)
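The intended semantics of the update operations can be pinned down with a plain Python list. This is only a reference sketch, with none of the space or time guarantees discussed later; the string "baz" is a made-up example value.

```python
# Reference semantics of the update operations on a plain list.
S = ["foo", "bar", "foobar"]

def insert(seq, s, pos):
    """Insert s immediately before position pos."""
    seq.insert(pos, s)

def append(seq, s):
    """Append is the special case of Insert at position len(seq)."""
    insert(seq, s, len(seq))

def delete(seq, pos):
    """Delete the string at position pos."""
    del seq[pos]

insert(S, "baz", 1)   # foo, baz, bar, foobar
append(S, "foo")      # foo, baz, bar, foobar, foo
delete(S, 0)          # baz, bar, foobar, foo
print(S)
```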


Requirements 


• Store the sequence in as little space as possible
- Close to the information-theoretic lower bound

• But still be able to support all the described operations (query and update) efficiently
- Aim for worst-case polylog operations


Some notation 


(foo, bar, foobar, foo, bar, bar, foo) 
0 1 2 3 4 5 6 


• Sequence S, |S| = n
- In the example n = 7
• String set S_set is the unordered set of distinct strings appearing in S
- In the example, {foo, bar, foobar}, |S_set| = 3
- Also called alphabet
• Sequence symbols can also be integers, characters, ...
- As long as they are binarized to strings


Wavelet Trees 


• Introduced in 2003 to represent Compressed Suffix Arrays

• Support Access/Rank/Select on sequences over a finite alphabet (of integers)
- Reduces to operations on bitvectors by recursively partitioning the alphabet

• String sequences can be reduced to integer sequences


Wavelet Trees 


• S = (a, b, r, a, c, a, d, a, b, r, a), S_set = {a, b, c, d, r}

[Figure: wavelet tree of S; the root splits the alphabet into {a, b} and {c, d, r}]
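A minimal pointer-based sketch of a wavelet tree on this example. It assumes the alphabet is split in half in sorted order (which yields the {a, b} / {c, d, r} split at the root) and uses plain Python lists as bitvectors, so each level costs O(n) here rather than the O(1) of a succinct bitvector. The class and method names are illustrative.

```python
class WaveletTree:
    """Toy wavelet tree: one bitvector per internal node, alphabet halved per level."""

    def __init__(self, seq, alphabet=None):
        self.alphabet = sorted(set(seq)) if alphabet is None else alphabet
        self.bits, self.left, self.right = [], None, None
        if len(self.alphabet) <= 1:
            return  # leaf: a single symbol, nothing to store
        mid = len(self.alphabet) // 2
        low = set(self.alphabet[:mid])
        # Bit 0 sends a symbol to the left half of the alphabet, bit 1 to the right.
        self.bits = [0 if c in low else 1 for c in seq]
        self.left = WaveletTree([c for c in seq if c in low], self.alphabet[:mid])
        self.right = WaveletTree([c for c in seq if c not in low], self.alphabet[mid:])

    def access(self, i):
        """Return the i-th symbol by following its bits down to a leaf."""
        if self.left is None:
            return self.alphabet[0]
        b = self.bits[i]
        child = self.right if b else self.left
        return child.access(self.bits[:i].count(b))

    def rank(self, c, pos):
        """Count occurrences of c before pos; assumes c is in the alphabet."""
        if self.left is None:
            return pos if c == self.alphabet[0] else 0
        mid = len(self.alphabet) // 2
        b = 0 if c in self.alphabet[:mid] else 1
        child = self.right if b else self.left
        return child.rank(c, self.bits[:pos].count(b))

wt = WaveletTree(list("abracadabra"))
print("".join(map(str, wt.bits)))         # root bitvector: 00101010010
print(wt.access(4), wt.rank("a", 11))     # c 5
```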
Wavelet Trees


• Space equal to the entropy of the sequence
- Plus negligible terms

• Supports Access/Rank/Select in O(log |S_set|)

• Later extended to support Insert/Delete...
- ... but the tree structure is fixed a priori
- The string set S_set cannot be changed!


The Wavelet Trie 


• The Wavelet Trie is a Wavelet Tree on sequences of binary strings (S_set ⊂ {0, 1}*)

• Supports Access/Rank(Prefix)/Select(Prefix)

• Fully dynamic...

• ... or append-only (with better bounds)

• The string set need not be known in advance


Wavelet Trie: Construction 


Sequence of binary strings

Common prefix: α = 010
Branching bit: β = 1001011


Wavelet Trie: Construction 


α: 010
β: 1001011


010111 
0100110 
0100001 
0101010 
0100110 
010111 
010110 


Wavelet Trie: Access 


α: 010
β: 1001011


0 010111 
1 0100110 
2 0100001 
3 0101010 
4 0100110 
5 010111 
6 010110 


Access(5) = 010111


Rank is similar 
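Access walks from the root to a leaf, emitting α and the branching bit at each node and mapping the position into the chosen child with a bitvector rank; Rank follows the bits of the query string instead. A self-contained sketch, assuming a prefix-free string set and using list slicing in place of a real rank-enabled bitvector (all names are illustrative):

```python
import os

def build(strings):
    """Same recursive construction: alpha = common prefix, beta = branching bits."""
    alpha = os.path.commonprefix(strings)
    if all(s == alpha for s in strings):
        return {"alpha": alpha, "beta": None, "children": None}
    k = len(alpha)
    beta = [int(s[k]) for s in strings]
    children = [build([s[k + 1:] for s in strings if s[k] == bit]) for bit in "01"]
    return {"alpha": alpha, "beta": beta, "children": children}

def access(node, i):
    """Reconstruct the i-th string from the prefixes and branching bits on its path."""
    if node["beta"] is None:
        return node["alpha"]
    b = node["beta"][i]
    j = node["beta"][:i].count(b)        # bitvector rank: position inside the child
    return node["alpha"] + str(b) + access(node["children"][b], j)

def rank(node, s, pos):
    """Count occurrences of s before pos by following the bits of s."""
    alpha = node["alpha"]
    if node["beta"] is None:
        return pos if s == alpha else 0
    k = len(alpha)
    if not s.startswith(alpha) or len(s) == k:
        return 0                         # s cannot occur in this subtree
    b = int(s[k])
    return rank(node["children"][b], s[k + 1:], node["beta"][:pos].count(b))

S = ["010111", "0100110", "0100001", "0101010", "0100110", "010111", "010110"]
root = build(S)
print(access(root, 5), rank(root, "0100110", 5))  # 010111 2
```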


Wavelet Trie: Select 


α: 010


0 010111 
1 0100110 
2 0100001 
3 0101010 
4 0100110 
5 010111 
6 010110 


Select(0100110, 1) = 4
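Select goes the other way: descend along the bits of the string to its leaf, then map the occurrence index back up with a bitvector select at each node. A self-contained sketch under the same assumptions (prefix-free set, plain lists as bitvectors, 0-based occurrence index); leaves additionally record how many strings they hold so that out-of-range queries can return None. All names are illustrative.

```python
import os

def build(strings):
    """Recursive construction; leaves also store their number of occurrences 'n'."""
    alpha = os.path.commonprefix(strings)
    if all(s == alpha for s in strings):
        return {"alpha": alpha, "beta": None, "children": None, "n": len(strings)}
    k = len(alpha)
    beta = [int(s[k]) for s in strings]
    children = [build([s[k + 1:] for s in strings if s[k] == bit]) for bit in "01"]
    return {"alpha": alpha, "beta": beta, "children": children}

def select_bit(bits, b, j):
    """Position of the j-th (0-based) occurrence of bit b in bits, or None."""
    for pos, x in enumerate(bits):
        if x == b:
            if j == 0:
                return pos
            j -= 1
    return None

def select(node, s, i):
    """Position of the i-th occurrence of s in the sequence, or None."""
    alpha = node["alpha"]
    if node["beta"] is None:
        return i if s == alpha and i < node["n"] else None
    k = len(alpha)
    if not s.startswith(alpha) or len(s) == k:
        return None
    b = int(s[k])
    j = select(node["children"][b], s[k + 1:], i)    # position inside the child
    return None if j is None else select_bit(node["beta"], b, j)

S = ["010111", "0100110", "0100001", "0101010", "0100110", "010111", "010110"]
root = build(S)
print(select(root, "0100110", 1))  # 4
```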


Wavelet Trie: Append 


α: 010
β: 10010110


010111 
0100110 
0100001 
0101010 
0100110 
010111 
010110 
010010 


Insert/Delete are similar


Space analysis 


• Information-theoretic lower bound
- LB(S) = LT(S_set) + nH_0(S)
- LT is the information-theoretic lower bound for storing a set of strings

• Static WT: LB(S) + o(hn)

• Append-only WT: LB(S) + PT(S_set) + o(hn)
- PT(S_set): space taken by the Patricia Trie

• Fully dynamic WT: LB(S) + PT(S_set) + O(nH_0(S))
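To give a feel for the nH_0(S) term, the zero-order empirical entropy of the running example (foo, bar, foobar, foo, bar, bar, foo) can be computed directly. The LT and PT terms depend on the encoding of the string set and are not computed here; the function name `n_h0` is illustrative.

```python
from collections import Counter
from math import log2

def n_h0(seq):
    """n * H_0(S): sum over distinct symbols of n_c * log2(n / n_c), in bits."""
    n = len(seq)
    return sum(c * log2(n / c) for c in Counter(seq).values())

S = ["foo", "bar", "foobar", "foo", "bar", "bar", "foo"]
print(round(n_h0(S), 2))  # about 10.14 bits for the whole sequence
```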


Operations time complexity 


• Need new dynamic bitvectors to support initialization (create a bitvector 0^n or 1^n)
• Static and Append-only Wavelet Trie
- All supported operations in O(|s| + h_s)
- h_s is the number of nodes traversed by string s
• Fully dynamic Wavelet Trie
- All supported operations in O(|s| + h_s log n)
- Deletion may take O(|ŝ| + h_s log n), where ŝ is the longest string in the trie


Thanks for your attention! 


